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1. Introduction 

The various No Free Lunch theorems are important theoretical results, indi- 
cating that no 'black-box' problem solver can be expected to achieve better than 
random performance without any information about the problem it is to be applied 
to. Such results have been proved for off training-set generalisation in supervised 
learning |TT], and for search and optimisation [SHI]- It is known that a crucial 
assumptions of the No Free Lunch theorems is that a (block) uniform distribution 
holds over the set of problems, or objective functions, under consideration and that 
this set be closed- under-permuation (c.u.p.). The c.u.p. requirement means that 
any permutation of any objective function in the problem set under consideration, 
through changing the mapping between elements of the search space and objective 
values, results in another objective function that is also a member of that problem 
set. However, it has been shown that realistic problem sets are highly unlikely to 
satisfy the c.u.p. assumption [5]. As a result, a large field of increasingly sophis- 
ticated work has developed seeking to prove that 'Almost No Free Lunch' results 
that are nearly as strong as the original No Free Lunch results, typically based 
on information theoretic considerations, still apply in realistic problem scenarios 

(e,g, Bona). 

In this paper we take a simplifying step back in an attempt to cut the increasingly 
complicated 'Gordian knot' that (Almost) No Free Lunch research has come to 
represent. Our approach is simple and intuitive, yet allows general results easily to 
be arrived at. We begin by examining the implications of the search version of the 
No Free Lunch theorem for real-world search algorithms, showing that revisiting 
algorithms also break the permutation closure condition required for the No Free 
Lunch theorem to hold. Thus allowing realistic, revisiting algorithms means there 
can be some best algorithm; the pertinent question then is whether we can identify 
this best algorithm, and it turns out that we can indeed. To answer this question 
we go on to present a novel analysis of search algorithms as stochastic processes, 
enabling us to present a No Free Lunch-like result that holds for arbitrary sets 
of problems, and for realistic algorithms that do not avoid revisiting points in 
the search space. Specifically we show that random enumeration has the best 
expected performance of all search algorithms applied to optimisation problems, 
for any distribution over any problem set. The implication of this is that empirical 
demonstration of superior search performance (relative to enumeration) for some 
algorithm on some problem set still predicts inferior performance on some second 
problem set if we know nothing about its relationship with the first. We thus 'cut 
the Gordian knot' by simplifying the assumptions underlying the No Free Lunch 
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theorem, and show why violations of its assumptions are unimportant for real- world 
search algorithms and problems. 

1.1. The Sharpened No Free Lunch Theorem. We begin by informally sum- 
marising the Sharpened No Free Lunch theorem. No Free Lunch arguments typi- 
cally consider a search space X, a set of possible objective values y, and objective 
functions of the form / : X h-> y. The set of all possible objective functions is 
then denoted J- = y . The original No Free Lunch theorems prove that, across 
all possible objective functions (or problems) J 7 , all non-revisiting algorithms have 
equivalent performance under an arbitrary performance measure [61112] . Algorithms 
are defined here as pseudo-random processes choosing previously unvisited points 
to visit based on the quality of prior visited points. Subsequently it was shown 
that the No Free Lunch theorem only holds if the set of objective functions under 
consideration is closed under permutation (c.u.p.). 



Theorem 1 (Sharpened No Free Lunch Theorem). All non-revisiting algorithms 
have equivalent performance over a set of objective functions F, under some arbi- 
trary performance measure, iff F is closed under permutation. 

Theorem [T] has subsequently been extended to also hold for 'block- uniform' dis- 
tributions over subsets that are c.u.p. [5] [?]■ This condition for the No Free Lunch 
theorem to hold is satisfied by the original requirement in j!2) of a uniform distri- 
bution over T , because the uniform distribution is a special case of a block-uniform 
distribution. Subsequently it has been shown that permutation closure is very un- 
likely to be satisfied in realistic scenarios [5] , leading many researchers to conclude 
that No Free Lunch results have little consequence for practical applications of 
search algorithms. 



We now examine one of the fundamental assumptions in the proof of the No 
Free Lunch result, that the algorithms used are non-revisiting in the solution space. 
That is, the algorithms visit every point in the solution space exactly once. It has 
been remarked previously that such algorithms are impractical due to the time and 
space cost of storing and querying the set of visited points |B] , although this is not 
strictly correct. It has also been suggested that revisiting algorithms, in the form 
of algorithms having a degree of redundancy in their solution encoding, have much 
to recommend them [3]. 

If we relax the assumption of non-revisiting algorithms, we can demonstrate the 
following result. 

Theorem 2 (Revisiting Breaks Permutation Closure). A revisiting search over a 
given search space under a given c.u.p. set of objective functions can be formally 
expressed as a non-revisiting search over some larger search space under a set of 
objective functions that is not c.u.p., for any set containing only objective functions 
mapping to more than one element. 

Proof: Searching under a c.u.p. set of objective functions F with an algorithm 
A which revisits some points x% € X is equivalent to a non-revisiting search of a 
new set of extended objective functions F' on a larger space X' , which contains i 
additional points x\ under the constraint that Vi : f(xi) = /(a^). 

For F' to be closed-under-permutation it must definitely be larger than F, as 
it contains functions with the same codomain (because the search is presumed to 
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be eventually exhaustive) on the larger domain X' , and increasing the size of the 
domain necessarily increases the number of possible permutations. However, by 
construction \F\ — \F'\, and so F' cannot be closed- under-permutation. □ 

Note that this theorem holds regardless of the degree of revisiting, so algorithms 
which revisit all points in the search space with equal frequency also effectively 
break permutation closure of the set of objective functions. 

We discuss the implications of this theorem for previous No Free Lunch results 
on encoding redundancy [3] in section |4l For now we briefly note the following 
obvious but simplifying result: 

Corollary 1 (Free Lunches). All violations of the No Free Lunch theorems can be 
expressed as non-block-uniform distributions over problem subsets that are closed 
under permutation. 

Proof: For uniform distributions over c.u.p. sets this follows directly from the- 
orem [5] and the observation that non-c.u.p. sets are special cases of non-uniform 
distributions over c.u.p. sets. The extension to block-uniform distributions is by 
applying theorem [2] to each c.u.p. subset. □ 

It is interesting to note that while violation of permutation closure leads to search 
algorithms potentially having different performance, this does not tell us to what 
extent algorithms might differ in performance. If induced performance differences 
are small, then searching for a superior algorithm for some problem set could be 
like looking for a 'needle in a haystack'. Here we present a first approach to quan- 
tifying the extent to which revisiting algorithms differ in performance, according 
to the amount of revisiting they allow. That violation of permutation closure leads 
to algorithms potentially having different performance has been known since [MB]. 
However, the strength of the Sharpened NFL theorem makes its contrapositive cor- 
respondingly weak; if F is not c.u.p., then there exists at least one performance 
measure under which at least one pair of algorithms have differing performance on 
F. This leads to the relevant question of how likely two randomly selected algo- 
rithms are to have different performance for a given non-c.u.p. problem set, which 
has not previously been addressed. If most algorithms have identical performance 
on a given problem set, then looking for an algorithm with superior performance 
may be like looking for a needle in a haystack. If, however, almost all algorithms 
have different performance then empirical comparison of different algorithms be- 
comes sensible. In fact the probability of two arbitrarily chosen algorithms having 
a different set of traces over some problem set, and hence potentially different per- 
formance under some appropriate performance measure, grows super-polynomially 
with increased frequency of revisiting, and so we can make a stronger claim than 
the contrapositive of the Sharpened No Free Lunch theorem provides: when revis- 
iting is allowed, performance differences can be expected. The proof for this claim 
follows: 

Theorem 3. Let A be an algorithm which performs a revisiting search on a space 
X and revisits r points before exhausting X. The probability that A and a similar 
algorithm B are indistinguishable in performance decreases super-polynomially as 
the amount of revisiting increases, except where all elements of the search space are 
mapped to the same objective value. 

Proof: Let F'* be the permutation closure of F', and F = f* where /* is 
the permutation orbit of function /. Let C C y be the codomain of /, and let 
Xi be the number of points in X which / maps to i G C (A is the histogram of 
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/ [5]). Searching a set of functions takes each function to a trace, so the size of an 
algorithm's trace set is \F \ = \F\ = jy — a~T- Since F 1S n °t c.u.p. this set of traces 
will be a small fraction of the traces for F'* ; we must make some assumptions about 
the behaviour of our algorithm to find \F'*\. Consider the case where the algorithm 
revisits points assigned to the most common objective value, j s C, r times during 
the search - the algorithm may do this by, for example, enumerating X until it 
knows j and then revisiting a point mapped to it r times - this simplistic revisiting 
algorithm is easy to analyse, and upper bounds the probability of distinguishability 
by ensuring that every function in F' has the same histogram and minimising the 
size of its permutation closure. Knowing that all the functions in F 1 have the same 
histogram, constructed by distributing the r revisits amongst the i bins of A, the 
upper bound follows from the observation that \\, A'J is maximised by allocating 
all r revisits to the largest bin in A (this claim can be shown by a simple induction) . 

Since every function in F' has the same histogram, in which Vx ^ j : X' x = X x 
and X'j = Xj + r, \F'*\ — j-j — ^x^+[i=j]-r)i ■ ^ ne ratio p = is then the fraction 
of the set of all possible traces which a given algorithm will produce. Observing 
that a uniformly randomly chosen algorithm searching a given objective function 
gives a uniform distribution over the traces for that function, we see that for an 
algorithm A searching F' an arbitrarily chosen algorithm B shares a trace with 
A with probability p, and so p bounds from above the probability that A and a 
random B are indistinguishable. 

Some manipulation gives that p — ^ J/^^, • ^p-, from which we can show 

\np = XI [=1 1 R \x\+i - We can upper-bound this with an integral, which gives us 

_ \F^\_ (r-l + Aj)( r - 1+A ^ x \X\\*\ 

~ W\ < Aj j x (r-l + I^D^-i+I^D ' 

from which we can see that l/p(r) is at least superpolynomial in r (excepting 
the trivial case where A^ = \X\). □ 

Thus the probability that two algorithms selected uniformly at random are dis- 
tinguishable increases very quickly as the extent to which the algorithms revisit 
increases. The expected number of points needing to be evaluated before a differ- 
ence is found, and hence the computational complexity of detecting differences in 
algorithms' performance, is an interesting question that is outside the scope of the 
current paper. Furthermore, this approach will still not enable us to reason about 
relative performance of algorithms. 

Returning to the main argument of the paper, we will next propose an alternative 
way of reasoning about revisiting algorithms and their expected performance over 
arbitrary sets of possible objective functions. This approach will indeed allow us 
to reason about algorithms' relative performance. 

3. Realistic Performance Measures and Random Search Algorithms 

Proving No Free Lunch for arbitrary performance measures is a powerful result, 
based on an unrealistic assumption; that search algorithms are described by ex- 
haustive non-revisiting enumerations of the search space and, equivalently that the 
set of objective functions considered be closed under permutation. 

For the remainder of this paper we change our viewpoint from regarding search 
algorithms as deterministic exhaustive, possibly repeating, searches, to regarding 
them as stochastic processes. We also change from considering arbitrary perfor- 
mance measures, to considering performance measures representative of the goals 
of search and optimisation. These two changes in perspective will enable us to 
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reason about the expected performance of realistic search algorithms, and make 
concrete recommendations to the practitioner. 

First we dchne a sensible performance measure. As mentioned above, showing 
that No Free Lunch does not hold does not enable us to reason usefully about rela- 
tive performance of different algorithms. We therefore define a sensible performance 
measure that reflects the goal of search and optimisation algorithms: finding good 
points in the search space. Concentrating on a particular performance measure will 
allow us to make definitive observations and recommendations on the relative per- 
formance of different search algorithms. This would not be possible if we allowed 
ourselves arbitrary performance measures (since an arbitrary performance measure 
is not required to distinguish between algorithms simply because some performance 
measure could do so). 

Definition 1 (Sensible performance measures). Let S(A, /, n) be the set of distinct 
values of objective function f observed by algorithm A after n points have been 
evaluated. Then a sensible performance measure M , which we wish to maximise, 
defines the performance of A on f after n points using only S(A, f,n), with the 
additional constraint that S\ C $2 M(S\) < Af (S2) 

Such measures capture the basic criterion against which search algorithms are 
assessed: how good are the solutions they generate? At first, it may appear that 
this performance measure discards the other important criterion of search algorithm 
performance: how long does it take to generate good solutions? However, as we 
shall see time is implicit in this performance measure, as we shall make comparisons 
of searches having equal length. 

We now make the following No Free Lunch-like statement for revisiting algo- 
rithms. 

Proposition 1 (Maximising Sensible Performance). For any distribution over any 
set of objective functions F , a randomly chosen enumeration A can be expected 
to equal or outperform a randomly chosen non-minimally-revisiting algorithm B, 
under any sensible performance measure M at any time t in the search. 

Proof: For any t > \X\ it is clear that a minimally revisiting algorithm has 
maximal performance as the full codomain of the objective will have been observed 
and, from definition [TJ M is required to depend only on the part of the codomain 
which has been seen. 

When t < \X\, A samples t points from X without replacement, whereas B 
samples t points from X with some non-zero probability of replacement (as a con- 
sequence of being non-minimally- revisiting). Consequently the expected number of 
distinct points sampled by A is greater than the expected number sampled by B. 
Similarly the expected size of the set of objective values observed by A exceeds (or 
equals, for functions whose codomain is of size 1) that of the set observed by B. 
Call these two sets of objective values Sa and Sb- 

Sa and Sb are subsets drawn randomly from the codomain of the objective, 
and the only effect B can have on the distribution of values drawn is to reduce 
its range, so that \Sb\ < \Sa\- For a randomly chosen set S of objective values, 
E(M(S)) increases with \S\ (this is clear from the definition of a sensible measure), 
and so E(M(Sa)) > E(M(Sb))- This holds for any individual objective function, 
the extension to arbitrary distributions over arbitrary sets being by linearity of 
expectation. □ 
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It is sometimes remarked that No Free Lunch approaches typically ignore the 
time and space complexity of the algorithms, therefore it is interesting at this 
point to note that enumeration can typically be implemented with excellent time 
and space complexity; 0{log\X\) time for each point queriecQ and 0(log l^j) space 
overall, when solutions can be represented as finite strings. No algorithm with 
lower space complexity can have better performance in the NFL sense (it would 
necessarily revisit, as it could not have a state for each point in which it would 
deterministically query that point), and only algorithms with better performance 
in the NFL sense can optimise faster in wall-clock time. This further indicates 
enumeration's pre-eminent position as the general search algorithm, and motivates 
its use as a benchmark by practitioners. 

An interesting consequence of proposition [1] is 

Corollary 2 (Random Search Performance) . Random search algorithms, those that 
ignore the outcome of their search in selecting subsequent search points, can differ 
in performance. 

Proof: This follows directly from propositionQ]and observing that random search 
algorithms, as defined above, include both blind enumeration, and revisiting algo- 
rithms. □ 

The definition of random search algorithms as being those that ignore the ob- 
jective values generated during the search was proposed by [12] • Corollary [2] is 
interesting, as the fundamental prediction of the No Free Lunch result has been 
stated as being that "if an algorithm performs better than random search on some 
class of problems then it must perform worse than random search on the remain- 
ing problems" (emphasis authors') [T2]. Corollary [5] shows that the property of 
'randomness' is actually not of great importance for No Free Lunch-like statements 
regarding realistic search algorithms. 

Finally, an important result emerges as a direct consequence of proposition [T] 

Corollary 3 (Performance Prediction). Empirically- demonstrated better-than- enumeration 
performance of some algorithm on some problem set predicts worse-than-enumeration 
performance on any other disjoint problem set whose relationship to the first is un- 
known. 

Proof: For some algorithm A let us denote expected deviation from the expected 
performance of enumeration on a problem set ip as E(Aa(iP))- Then, from proposi- 
tion[TJ E(A cnum (ip)) = for any ip, and similarly for any non-minimally-revisiting 
algorithm A whose performance on ip we know nothing about, E(Aa{iP)) < 0. If we 
have an unknown problem set <p whose relationship with set ip is unknown then, by 
definition, .E(Aa(<^)|V0 = E(Aa(4>)), which must also be negative by proposition 
□ □ 

A related but different result based on considerations of function complexity has 
previously been presented [T], which demonstrates that for any function where an 
algorithm performs well there is another function of similar complexity on which 
it performs badly. Our result is both simpler and more generally applicable, in 
that it makes no reference to the details of the algorithms or the complexity of the 
functions involved, but weaker in its predictions, in that it predicts no correlation 
between performances on the two problem sets, rather than a negative relation. 



The time taken to exhaust the search space is thus 0(| A*| ) , which also cannot be improved upon. 



beyond no free lunch: realistic algorithms for arbitrary problem classes 

4. Encoding Redundancy, Neutrality and Revisiting 

In No Free Lunch contexts, revisiting by an algorithm might occur through the 
decision not to keep track of search points visited so far. An additional cause 
for revisiting is encoding redundancy. We can represent encoding in a No Free 
Lunch context by considering an additional set W of encoded solutions, which are 
decoded into their corresponding points in the search space by a 'growth' function 
g : W I— > X [6 ]. If this mapping is not injective, then we have a redundant encoding. 
Such redundancy is often found in heuristic search algorithms, and will turn a non- 
revisiting algorithm in the space of encoded solutions into a revisiting algorithm 
in the true solution space, so that the No Free Lunch theorem no longer holds. 
Radcliffe & Surry may not have noticed this fact, as their method of proving the No 
Free Lunch result relies on setting the search space W and solution space X to be the 
same, and the growth function g to be the identity mapping. It has previously been 
suggested in [13] that a non- uniform (i.e. biased) encoding redundancy would break 
the No Free Lunch theorem. Here we have demonstrated that in fact non-uniformity 
is not the key requirement for breaking No Free Lunch, encoding redundancy alone 
is sufficient. 

Previously Igel and Toussaint have presented an analysis of encoding redun- 
dancy, or neutrality, in No Free Lunch situations, arguing that under encoding 
redundancy all algorithms still have equal expected performance [3J. In fact the 
No Free Lunch theorem does not hold once redundancy is introduced for, as shown 
above, redundant and hence revisiting algorithms break permutation closure of any 
set of objective functions. Thus care must be taken in interpreting these results; the 
assumption of theorem 1 in 3j is violated and hence the subsequent claim that the 
time to find an optimum averaged over all functions is the same for all algorithms 
when redundancy is allowed is incorrect. Igel and Toussaint actually analyse the 
difficulty of search problems in terms of the proportion of distinct solutions (mem- 
bers of X) mapped to the same global optimum (member of set y); this is not 
the same thing as considering multiple encoded solutions (members of W) mapped 
to the same actual solution (member of set X). As their analysis considers only 
algorithms that are non-revisiting in the solution space the No Free Lunch result is 
still applicable and their analysis valid, however nothing has been said about the 
effects of true encoding neutrality on the performance of search algorithms. When 
encoding redundancy is correctly defined as a non-injective mapping between a 
larger representation space and a smaller solution space, the stochastic search re- 
sults presented here become applicable. The observation that encoding redundancy 
in a given problem leads to enumeration having the best expected performance may 
raise questions about the theoretical basis for the claimed benefits of neutrality and 
self-adaptation [3J, which are outside the scope of this paper. 

5. Conclusions 

In this paper we have shown that the requirement for the Sharpened No Free 
Lunch theorem to hold, that the problem set under consideration be closed under 
permutation, is in fact a necessary and sufficient condition for a version of the 
theorem, extended to also admit revisiting algorithms, to hold. It has previously 
been shown that the c.u.p. condition is very unlikely to be satisfied for realistic 
problem classes. We have shown further that realistic, revisiting algorithms also 
violate this condition. That the No Free Lunch theorems' assumptions are typically 
violated has been used many times as an argument that they can reasonably be 
ignored by those working with search algorithms, while a branch of research has 
also grown up showing conditions under which 'Almost' No Free Lunch results hold. 
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If the conditions of No Free Lunch are violated then some algorithm can have 
best expected performance, but this does not help us to find it, or to say anything 
about it. The main contribution of this paper is to propose a statistical analy- 
sis of search algorithms as random processes, demonstrating that algorithms that 
minimise the extent to which they revisit points in the search space have higher 
expected performance. The result of this is the demonstration that for arbitrary 
sets of objective functions, including those which are not closed under permutation 
and over which arbitrary probability distributions hold, enumeration has better 
expected sensible performance than any arbitrary revisiting algorithm. Note that 
arbitrary problem distributions obviously includes realistic problem distributions, 
such as the universal distribution [S]. It is also interesting to note that, through use 
of a string encoding, an algorithm to perform a blind enumeration of a search space 
will typically have excellent algorithmic time and space complexity, in 0(1^1) and 
O (log \X\) respectively. 

This paper may also help guide practice. Theorem [3] should be of great interest 
to those empirically investigating the performance of search algorithms. It does not 
say that superior performance on one problem set will necessarily result in inferior 
performance on another, but does say that if we have no information whatsoever 
about the relationship between two problem sets, performance observed to be bet- 
ter than enumeration on one still predicts performance worse than enumeration on 
the other. Of course, it is rare that one knows nothing at all about the relationship 
between problem sets, even if it is simply that at an intuitive level problems from 
the two sets seem similar in some way. Furthermore as an experimenter collects per- 
formance statistics for their algorithms on the unknown problem set, uncertainty 
over the algorithms' performance on that set will decrease so that they are able 
to predict performance on previously unseen instances with increasing confidence. 
Nevertheless, consider the best policy if you were asked to play the following simple 
game: a third-party gives you an algorithm and a set of performance data for it 
on some problem set, then tells you that you have a choice of whether to apply 
the algorithm to one problem from some other problem set, or to apply a random 
enumerative search. No information whatsoever is given about the new problem 
set, and the aim of this game is to achieve the best performance on the new un- 
seen problem on your first attempt. Theorem [3] indicates that you should choose 
enumeration instead of the known algorithm. Obviously one rarely, if ever, wishes 
to win such a game, and applying a known algorithm to an unknown problem set 
has another payoff in terms of information gained, but hopefully theorem [3l the 
computational simplicity of enumeration, and the game outlined above will all help 
to convince practitioners that an algorithm's empirical performance on some prob- 
lem set should be directly compared against a deterministically enumerative search 
started at some uniformly randomly selected point in the search space or, equiva- 
lently, a uniform random search using sampling without replacement. At present 
such practice is far from common. 
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